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Summary. In this paper we consider the problem of model choice for a set of insurance loss ratios. We 
use a reversible jump algorithm for our model discrimination and show how the vanilla reversible jump 
algorithm can be improved on using recent methodological advances in reversible jump computation. 



■("Corresponding author, Email:gob20@statslab. cam. ac .uk 



2 



D.045 



D.040 



D.035 



D.030 - 



D.025 - 



D.020 




2 3 4 5 

Year 

Fig. 1. A plot of the loss ratios against year. 



1. Introduction 



In traditional insurance settings model selection and uncertainty are usually not treated, even though model 
selection problems have been a ctively researched in statistics. Recent ly this shortco ming has been addressed 
by several authors: ICairnsI 020001) in general insurance and risk theory ;|Keatinge ( 1999) in esti matin g the num- 
ber of components in a mixture of exponentials for estimating claims amounts; also lHarrisI {1999) considers 
the pro blem of model selection for vect or autoregression for financial time series. In the field of credibility 
theory, 



Biihlmann and Buhlmann 



( 1999) considers the selection of variables in certain regression credibility 



models. 



In this paper we consider the problem of parameter estimation and model selection in the analysis of 
workers' compensation loss ratios. This paper is motivated by the analysis of data consisting of workers' 
compensation loss ratios arising over a seven year period. The data are part of a set containing frequency 
counts on workers' compensation insurance. The number of claims against the workers' compensation insur- 
ance scheme is recorded, together with the corresponding exposure values. The exposures are scaled payroll 
totals and provide a measure of the size of the exposed group. 

The model we fit to the data is described in Section |2] we then introduce two additional models, both of 
which are sub-models of the first. Using the reversible jump method describe d in Section P] we di scriminate 



Brooks et al 



(2003) to derive 



between the three models. We also use the efficient proposals method of 
proposa ls for our rev ersible jump updates. The results are compared with the pilot-tuned vanilla reversible 
jump of ( 



Green 



(1995). 
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2. The Data and Model 



We denote the number of claims for year j by Li and the corresponding exposure values by Ej for j= 1 , . . . , n. 
For this particular dataset we have n — 1. The loss ratios which we propose to model are then defined as the 
number of losses per unit exposure and will be denoted by Rj, where Rj = Lj/Ej. Let R" denote the collective 
loss ratios and E" denote the collective exposure values. Here, we use a hierarchical normal model to describe 
the loss ratios, so that 



Ri 



• 1 («,.;cT/wi M./ I « 



(1) 



where oc ; - denotes so me underly i ng tim e- varying process which describes the progression of ratio level over 
time. Here, we follow lKlugmanl d 19921) and adopt the following model for the a, process 



aj~Jf (pa ; _i + (l-p)TJ,T V )j=l,. 



(2) 



For ao, p, and rj we use standard normal jV (0, 1) priors and for the precision (inverse variance) parameters 
a and x we use Gamma (a, b) priors. The literature provide s empirical evidence to support the introduction of 



this model for describing loss ratios dLedolter et al 



1991D and a simple plot of the data in Figure Q] confirms 



that the observed behaviour can be described by a model of this sort. However, one disadvantage of this 
model is that whereas the loss ratios are always non-negative, the normal model has support extending across 
the entire real line so that negative values could, in theory, occur. One way around this would be to restrict 
the normal model in (Q]i to loss ratios within the region [0,°°) and/or to impose similar restrictions on the 
(Xj process. These restrictions are very easily implemented as a trivial extension of the scheme we describe 
here but, since by adopting the more general model, serious failures in the ability of the model to describe 
the observed data can be detected when negative estimates are obtained, the more general model provides a 
useful check for the adequacy of our modelling scheme. 



3. A Gibbs Sampling Algorithm 

For the model presented in Equations (HJ and (01 we use Gibbs updates to obtain samples from the posterior 
distribution of the model parameters. The full joint posterior distribution of all the model parameters is given 
by 

n(a,z,a,p,ri\R\E")ocL{R"\a,a,E")p{a\p,ri,z,a )p(a)p{t)p(p)p(ri)p{aQ), 

where a denotes the collection (a,\ , • • • , 0£„), p{ ) denotes the prior distribution for the corresponding param- 
eter and the likelihood term 

L(R"\a,a,E n ) = flf(Rj\a,a,E n ). 

j=i 

We now derive the full posterior conditional for each of the model parameters in turn. These conditional dis- 
tributions will then be used to implement a Gibbs update algorithm. The full posterior conditional distribution 
for a is 



n(a\a)oc p (a)L(R n \a,a,E") 

« exp {-a (b + 1 i; j=l Ej(Rj - a] f) } , 



which is a gamma distribution with shape parameter a + | and scale parameter b + j Y!j=i Ej(Rj — <*/) 2 . The 
full posterior conditional for T is 

%{x\a,p,r])^ p{x)p{a\c^,p,r],T) 

^T^-iexpj-T^ + I^^a.-pa^j-Cl-p))]) 2 )}, 

which is a gamma distribution with shape parameter a + 1 and scale parameter b+ j Y!j=i { a j — P a j-\ — (1 — 
p)r\) 2 . The full posterior conditional distribution for p is 

7r(p|a,77,T)oc p(p)p(a\ao,p,ri) 



exp < 



(l + ^I"=l(T?-«y-l) 2 ) / T^ =1 (TJ-a ; -)(T?-a,-i) N 



l + ^ =1 (7?-a,-i) 2 



which is a normal distribution with mean 



and variance (1 + t£" =1 (t7 — a 7 _i) 2 ) -1 . The full posterior conditional distribution for 77 is 

ff(T}|a,p,T)ocp(T})p(a|ao,p,T}) 

/ l+«T(l-p) 2 / t(l -p)Fj=i(a } -paj-iy 

0= exp < — 77 



1+«t(1 -p) 2 



which is a normal distribution with mean 



(l+nT(l-p) 2 )- 1 (T(l-p)£; =1 (a ; --pa ; -_ 1 )), 
and variance (1+«t(1— p) 2 )~'. The full posterior conditional distribution for aj is 
n{aj\a {j) , p,ri, a, t)oc 

p(Uj)p(Uj+i\Uj,P,il,T) j = 

p{a j \a j - l ,p,r\, x)p{a j+ { \ aj , p , 77 , x)p {Rj \a h a) j = 1 , . . . , n - 1 
,/>(«/ 1 a/-i , P . n , t)p (fy 1 a, , a) 7 = « 

which is a normal distribution with mean rrij and variance Vj where 



( (pT(a /+ i-(l-p)77)) 

0+7^) 

(prjoy+i - (1 - p)?7) + T(pa i _i + (1 - p)rj) + 



7 = 



7 =«, 
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and 



1 



(1+P Z T) 
1 



(P 2 T+T- 
1 



oE, 



1. 



These conditional distributions are then used to simulate a dependent sample from the posterior distribution 
of the model parameters given the data R" by sampling each in turn within each iteration. All the posterior 
conditionals are standard distributions, hence there are no difficulties in simulating from them. We could 
construct a more general Metropolis type algorithm which updates the parameters a, p and rj at the same 
time. This would necessitate introducing an acceptance/rejection stage to ensure stationarity. 



3.1. Simulation Results 

The Gibbs model above was implemented with a = 0.001 and b = 0.001 so that the precision parameters a 
and T have vague, flat priors. The posterior means and 95% highest posterior density (HPD) intervals are 
shown in Table[T]and trace plots of the model parameters are shown in Figure[4] The 95% HPD interval is the 
smallest region of the parameter space which contains 95% of the posterior probability mass of the parameter. 
A plot of the marginal posterior of p reveals that its density is bimodal with one mode near and another at 
p = 1 . The posterior density of rho is shown in Figure|2] Even though the 95% HPD interval consists of only 
one interval, a corresponding 90% HPD interval is actually a union of two disjoint intervals, each containing 
one of the two modes. The effect of the bimodality of p can be seen by the wide intervals for the parameters 
Oo and 77 . 

A possible explanation is that the posterior conditional of Oo is the same as its prior density, since there 
is no need for Oo if p is identically 0. Similarly when p is close to the mode at 1, the conditional posterior 
of 77 is almost identical to its prior density. Consequently, these two parameters are being sampled from two 
distinct posterior densities corresponding to whether p is close to or 1 . The reason for the bimodality of p is 
not entirely clear, the model may be overparameterised since we are fitting 12 parameters to 7 data points. To 
observe the effect of the number of parameters we can reduce the effective number of parameters being fitted 
by integrating out the nuisance parameters a and T, then re-fitting the model and observing any differences. 
The results of this new implementations are identical to the first implementation with both a and T included, 
as we show in the next section. 



3.2. Integrating out the Variance Parameters 



Papaspiliopoulos et al.l (12003b shows that for Gaussian models similar to that described in Equations[T]and (O 



the convergence properties are largely determined by the values of the variance components. In this Section 
we redo the analysis, however this time we integrate out the variance parameters a and T. This results in 
fewer parameters to be estimated, but as the results show, it also increases the autocorrelation of the other 
parameters. Also the complexity of the model has been reduced. The form of the conditional posteriors, 
however, has been made more complex. We use a random walk Metropolis algorithm to simulate from the 
posterior distribution of the remaining unknown parameters Oo , Cti , . . . , a„ , p , rj . The results show that the 
posterior estimate of p is still bimodal. 



Table 1. Posterior means and 95% HPD 
Intervals for the model parameters with a = 
b = c = 0.001. 



r dldillCLCI 


CSLllllaLC 


HPD Tntprvnl 
7J /C nrU lllLCIVal 


ao 


0.0167 


(-1.1188, 1.0619) 


a t 


0.0256 


(-0.0311, 0.0817) 
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0.0246 


(-0.0211,0.0708) 
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Fig. 2. Posterior density of p. 
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If a ~ Gamma(ai,£>i) and X ~ Gamma (02, £>2), recall that in Section [3]we showed that the posterior 
conditionals of a and t are of the form of Gamma densities, and both are independent of each other, since 
the posterior conditional of a does not depend on T and likewise that of z does not depend on a. 

Since also the posterior conditionals are of standard form, we can integrate out these two parameters 
leaving a density involving only the other parameters. Using the fact that the posterior conditionals for a and 
T are standard Gamma densities, we can show that 



7t(a,ao,p,ri\R n ) = J 7t{a 1 a ,p,rj,a,T\R")dadT, 



so that 

n(a,ao,p,il\R")- p(p)p(t7)p(oo) x (h + ^Ejiaj -Rj) 2 y {a ' +n/2) x 

n2\-("2+«/2) 



(^ + ^K-P«;- 1 -(1-P) I) )T 1 " 2 " (3) 



where 7z(a, Cto,p, Tj\R") is the posterior density of a, OCq, p and tj given the data. 
Now given (0 the following posterior conditionals are readily observed 

% (p | a , oo , t? ) - p (p ) (b 2 + 3 £ ( ctj - P a h , - ( 1 - p ) 77 ) 2 ) ~ ( " 2+ " /2) , 

and 

7r(T7|a, ao,p) - p (n) (62 + j£(aj -po H - (1 -p)T7) 2 ) _( " 2+ " /2) . 

also 



7r(a,a |p,T7)oc P (a ) (b 1 + ^ E j( a J- R j) 2 



X 



(&2 + 3E(«; - P a /- 1 - ( 1 - P ) »J ) 2 ) 



2\-(«2+n/2) 



We use this scheme because our attempts to update p, 77 and a as one block using a 10-variate normal 
distribution centred at the current values did not work very well. 

These are all non-standard densities and to implement this model we used a Gibbs updating scheme with 
random walk Metropolis algorithms for r\, p and Oo with uniform distributions centred at the current values. 
For p, T] and Oo, the width of the proposal interval was determined by fine tuning an initial run until the 
acceptance rates were 0.27, 0.15 and 0.29, respectively. For 0£i , . . . , OC7 we used a 7-variable normal density 
as the proposal for a random walk Metropolis algorithm centred at the current values of these parameters. The 
covariance matrix for this proposal distribution was determined from an initial run from which we computed 
the covariance of the parameters 0C\, . . ., a-j. With this covariance matrix the acceptance ra te of the Metropolis 



algorithm is 0.15, this is smaller than would be ideal (Ro berts and Rosen thal. 



1998 



2001). 



The results for this model are shown in Table [2] they are similar to those in Table [T] The main difference 
here is that the 95% HPD intervals for Oo and Tj are now smaller and more concentrated around the posterior 
means, which represents an improvement on those given in Table Q] 

An important diagnostic tool in MCMC modelling is the autocorrelation plot of the parameter of interest. 
Figures [3] and |5] show the autocorrelation functions for the parameters of interest in our model. For the full 



8 



Table 2. Posterior means and 95% HPD 
Intervals for the model parameters, after in- 
tegrating out the variance parameters with 

a \ — a 2 = b\ — £>2 = 0.001. 



Parameter 


Estimate 


95% HPD Interval 


ao 


0.01597 


(-1.1332, 1.0953) 


a. 


0.02551 


(-0.0312, 0.0830) 


a 2 


0.02439 


(-0.0204, 0.0706) 


a 3 


0.03987 


(-0.0063, 0.0842) 




0.02703 


(-0.0161,0.0705) 


a 5 


0.03630 


(-0.0086, 0.0781) 


«6 


0.03654 


(-0.0072, 0.0776) 


a? 


0.02942 


(-0.0169, 0.0777) 


P 


0.21410 


(-0.5585, 1.1177) 


i] 


0.02775 


(-0.4214, 0.4264) 



implementation the autocorrelation values are essentially zero at lags greater than 5, except for p where lags 
up to 25 are large. The implementation with the inverse-variance parameters integrated out does not appear to 
be better than the full implementation. The autocorrelations for this implementation are bigger than those of 
the full implementation at all lags and are significant up to lag 10, excepting for p which has autocorrelation 
significant up to lag 15. The autocorrelation plots and trace plots for the implementation with the variance 
parameters integrated out are shown in Figures|6]and|5] respectively. This could be indicating poor mixing of 
the Metropolis algorithm due to the complexity of the terms in Equation (01 and small Metropolis acceptance 
rates. The small Metropolis rates could also be indicating that the proposal variances need to be smaller so 
that proposed values are closer to the current values and will have a greater chance of being accepted. 

These results suggest that fitting models with p = or p = 1 should give better description of the data, 
i.e. p = and p = 1 might be plausible alternative models. In the remainder of this paper we examine in 
more detail the simplified models with p = and p = 1, and how to discriminate between them. The model 
with p = means that a simple variance components model will be enough to describe the data whereas the 
model with p = 1 means a simple autoregressive model with Normal errors will be adequate to describe the 
data. 



4. The Reversible Jump Algorithm 



In this section we discuss trans-dimensional algorithms. The algorithms we have discussed before, notably 
the Metropolis-Hastings algorithm and the Gibbs sampling algorithm, cannot be used to simulate Markov 
chains where the dimension of the state vector can change at each iteration. This situation arises particularly 
in model selection problems where there are competing models, and where the size of the parameter vector 
is allowed to vary between models. 

In this context the distribution of interest is defined jointly over both parameter and model space. Sev- 
eral authors have proposed simulation methods to construct Mark ov chains which c an explore such state 
spaces. These include th e product spac e formulation given in lCarlin and Chibl dl995h. the rev e rsible jump 



(RJM CMC) algori t hm of iGreenl d 1995b . the jump diffusion method of Grenander and Miller] d 1994 . and 
Philli ps and Smith] (1996) and the continuous time birth-death method of lStephensI (120001) . Also for partic- 
ular problems involving the si ze of the regression vector in regression analysis there is the stochastic search 
variable selection method of iGeorge and McCullochl (119931) . In the remainder of this section we describe the 
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Fig. 3. Autocorrelation plots for p, rj and a. 
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Fig. 4. Trace plots of model parameters. 
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Fig. 5. Autocorrelation plots for p, 77 and a with inverse-variance parameters a and t integrated out. 
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Fig. 6. Trace plots of model parameters, after integrating out the inverse-variance parameters. 



Bayesian Analysis of Loss Ratios Using the Reversible Jump Algorithm 1 3 



reversible jump method of 



Green 



( 1995). In practice trans-dimensional algorithms work by updating model 
parameters for the current model then proposing to change models with some specified probability. 

The Reversible jump algorithm represents an extension of the Metropolis-Hastings algorithm. We assume 
there is a countable collection of candidate models, indexed by M £ j$ = {M\, M2,. . . , M^}. We further 
assume that for each model M,-, there exists an unknown parameter vector ; - 6 W' where n,-, the dimension 
of the parameter vector, can vary with i. 

Typically we are interested in finding which models have the greatest posterior probabilities and also 
estimates of the parameters. Thus the unknowns in this modelling scenario will include the model index M, 
as well as the parameter vector 0,. We assume that the models and corresponding parameter vectors have 
a joint density 7f(M,-, 0,). The reversible jump algorithm constructs a reversibl e Markov chain on the state 
space x Um,g^^"' which has n as its stationary distribution ( !Greenlll995h . In many instances, and in 
particular for Bayesian problems this joint distribution is of the form 



n{M u Si) = 7i (Mi, 6i\X) - L(X\Mi, Qi) p(M u 0,-), 



where the prior on (Mi, 0, ) is often of the form 



p(M i ,9 i )=p(9 i \M i )p(M i ) 



with p(Mi) being the density of some counting distribution. 

Suppose now that we are at model M, and a move to model Mj is proposed with probability r,;. The 
corresponding move from 0, to 9 j is achieved by using a deterministic transformation hy, such that 



(8j,v) =hij(8i,u), 



(4) 



where u and v are random variables introduced to ensure dimension matching necessary for reversibility. To 
ensure dimension matching we must have 

dim(0j) + dim(v) = dim(0 ( ) +dim(t<). 



For di scussions about possible choices for the function we refer the reader to lGreenl (1 1 9951) . and 
J2OO3L Let 



Brooks et al 



A{6 t ,6j) 



n{M h dj) g(v) rjj 
%(Mi, 0,) q(u) rij 



dhij(6i,u) 



d{9„u) 



(5) 



then the acceptance probability for a proposed move from model (M,-, 0,) to model (Mj, dj) is 

min{l,A(fl,,fl y )} 



where q(u) and q(v) are the respecti ve proposal de nsities for u and v, and |<9/i, ; (0, , m) /d(9i,u) | is the Jacobian 
of the transformation induced by hij. Green ( 1995) shows that the algorithm with acceptance probability given 
above simulates a Markov chain which is reversible and follows from the detailed balance equation 



n{Mi, 9i)q(u)rjj = %(M h 9j)q(v)rjj 



dhjj(9i,u) 



d(9i,u) 



14 



Detailed balance is necessary to ensure reversibility and is a sufficient condition for the existence of a unique 
stationary distribution. For the reverse move from model Mj to model M, it is easy to see that the transforma- 
tion used is (9i,u) = hjj l (9j,v) and the acceptance probability for such a move is 



min < 1 



n(Mj, Gj) g(u) ry 



dhij(6i,u) 



d(0i,u) 



min {I, A(Gi, 6 j)- 1 }. 



For inference regarding which model has the greater posterior probability we can base our analysis on a 
realisation of the Markov chain constructed above. The marginal posterior probability of model M, 

/ , \ p(Mi)f(X\Mi) 
7c(Mi\X) - ' ' 



J LM j e.*P{M j )f{X\M j y 
where 



f(X\Mi) = J L{X\Mu9i)p{9i\Mi)d9i 



is the marginal density of the data after integrating over the unknown parameters 9. In practice we estimate 
n(Mj\X) by counting the number of times the Markov chain visits model M, in a single long run after reaching 
stationarity. These between model moves described in this section are also augmented with within model 
Gibbs updates as given in Section[3]to update model parameters. 



4.1. Efficient Proposals 

In practice the between model moves can be small resulting in poor mixing of the resulting Markov chain. 
In this section we discuss recent attempts at improving between model moves by in creasing the acceptance 



probabilities for such moves Seve r al authors have addressed this probl em includin g Troughton and Godsill 



1997) 



(2001 



Giudici and Roberts ( 1998), Godsill (2001), Rotondi (2002), and 



Al-Awadhi et al 



(2004). 



Green and Mira 



) proposes an algorithm so that when between model moves are first rejected, a second attempt is made. 
This algorithm allows for a different proposal to generated from a new distribution, that is allowed to depend 
on the previ ously rejected pro posal. Methods to i mprove mixing of reversible j ump chains have a l so bee n 



proposed by 



Green 



(2002J) and 



Brooks et al 



A g eneral strategy proposed by 



(2003), which has been extended bv lEhlers and B rooks (2002). 



Brooks et al 



(120031) and extended to more general cases by 



Ehlers and Brooks 



(2002) is based on making the term Ay (0,-, 9 ,) in the acceptance probability for between model moves given 
in Equation © as close as possible to 1 . The motivating reason for this is that if we make this term as close 
as possible to 1 the the reverse move acceptance governed by 1 /Ay(0,-, 9j) will also be maximised resulting 
in easier between model moves. In general, if the move from (M, . 0,) (Mj, 9j) involves a change in di- 
mension, the best values of the parameters for the densities q(u) and q(v) in Equation (0 will generally be 
unknown, even if their structural forms are known. Using some known point (k,v), which we call the cen- 
tering point, we can solve A i; (0,, 9 ;) = 1 to get the parameter values for these densities. Setting A, ; - = 1 at 
some chosen centering point is called the zeroth-order method. Where more degrees of freedom are required 
we can expand A, ; - as a Taylor series about (u,v) and solve for the proposal parameters. For the methods we 
use in this paper the new parameters are proposed so that the mapping function in Equation (0]i is the identity 
function, i.e., 

(9j,v)=hij(9i,u) = (u,9i) 
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and the acceptance ratio term Ajj(9i, 9j) probability in Equation <(5j becomes 



15 



Aij(e,,ej) 



n{M h dj) rji g{v) 
n(M u 9i) nj q(u) 

n{M u 6i) rtjqiejY 



4.2. Convergence Assessment 

Conve rgence assessment for trans-dimensional algorithms are still in their infancy. 



Brooks and Giudici 



( 1999) propose to run I > 2 chains in parallel and base their convergence diagnostic on splitti ng the total 



Brooks et al. 



variation not just between chains but also between models. Their method was extended by 
(2003) to include non-parametric techniques, including chi-square tests, Kolmogorov-Smirnov tests and di- 
rect convergence rate estimation. The latter being similar to the ideas of Raftery and Lewis (119921) for the 



fixed dimensional Metropolis- Hastings or G ibbs algorithms. ICastelloe and Zimmermanl (120021) also develop 
methods based on the ideas of iBrooks et al.l (120031) which can be used only where the parameters have the 



same interpretation across all models. 



Brooks et al 



( 2003) suggest several methods for assessing convergence within the context of model selec- 
tion problems. In particular for reversible jump algorithms we can have some idea of how fast the simulations 
approach stationarity by comparing the empirical stationary distribution on the observed model orders. They 
propose specific test statistics based on the ^-square distribution and also a Kolmogorov-Smirnov test for 
goodness of fit. The ^-square and Kolmogorov-Smirnov compare the stationary distribution of each chain 
and computes p-values for the computed test statistics. A critical value of 5% is used so that if the ^-square 
or Kolmogorov-Smirnov statistic is above this si gnificance level the re is no reason to reject the chains as not 
being from the same stationary distribution. See 



Brooks et al. 



d2003l) for further details. 



5. Model Selection Using Reversible Jump Algorithms 

In this section we introduce two additional models and describe a reversible jump model selection technique 
to discriminate between them. Denote the full model in Equations (fl} and (O with Mi, we introduce two 
additional models, which are sub-models of M\. The second model, M2, has p fixed at 1. For this model 
there is no tj and the first two levels are 

The prior distribution on ao, (7 and T remain as in M\ . The posterior conditionals are exactly the same as in 
Section [3] simplified with p = 1 where necessary. The third model, M3, has p fixed as well, however this 
time at 0, which results in a simple random effects model: 



Rj\a'j,o" -^(c^K^)- 1 ) 
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The prior distributions on rj, a and T remain as in M\. Again the posterior conditionals are as those in 
Section[3]with p = where necessary. 

The computation here is a simple extension to the Bayesian posterior distribution described in Section[3] 
above. Here we have model space ^={M\, M2, M3} with three models, where M\ is the original model 
described in Equations (Q]l and @. Models M2 and M3 correspond to the simplified models with p fixed at 1 
and 0, respectively. We can extend our posterior distribution to consider both parameter and model space by 
taking as our posterior for model M\ 

n(M u a,T,a,p,ri\R n )ocL{R"\a,a) P (a\p,ri,T,a ) P (a) P {z)p(p)p{ri) P (a( ) )p(M l ). 

For the simplified models M2 and M3 the posteriors defined up to the constant of proportionality are 

7t(M 2 , a', T?,o'\R n ) « L(R»\a>, a') P {a'\x', c^)p{a')p{x') P {c^)p{M 2 ), 

and 

tt(M 3 , a", t", a", 7}"\R n ) « L(R"\a", a")p(a"\r,", t") P (o")p(t") P (t]")p(M 3 ), 

respectively, where p(M{) is some discrete prior distribution on the model space „#. Posterior model prob- 
abilities may then be obtained by marginalisation i.e., integrating out a, OCq, p, 77, a and T to obtain the 
posterior marginal for M, given the data. For the implementation we start with each model having equal prior 
probability 

p{M l )=p{M 2 )=p{Mi) = \, 
and ry the probability of proposing a move to model Mj when at model M, taken to be j for i, j = 1,2,3 and 

In the discussion that follows for ease of notation we suppress the dependence of the densities on the 
parameters a = ((Xi , . . . , 0£„), a, and T since these parameters are common to all models. In addition for our 
reversible jump moves these common parameters are kept fixed between models. 



5. 1. Pilot Tuned Methods 



Consider a proposed move from to (M\,ao,p,r]), we need to increase the dimensionality of the 

parameter vector by adding three components Oo, p and 77 and removing (Xq. To achieve this we simulate u\, 
U2 and M3 from densities q(u\), q{uj) and q(u^), respectively, and set 

(0£b,P, TJ,v) =/l2l(0^,«i,M2,«3) = («l,M2,«3,«o), 



where the variable v is needed to ensure dimension matching and reversibility. We further assume v has some 
density q{v), which we use to simulate values of v for the reverse move from M\ to Mi- The acceptance 
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probability for such a move is then min{ 1 , A21 } where 



7t(M u OQ,p,ri) 

At] = ; 7t X 

n{M 2 ,oQ 
n{Mx,OQ,p,r\) 



n(M 2 ,a' ) 
n{M u aQ,p,T}) 
n{M 2 ,aL) 



q(ui)q(u 2 )q(u 3 ) 

q(u\)q(u 2 )q(u 3 ) 

g(«o) 
q{ao)q{p)q{T)Y 



dh 2 i(cCQ,ui,u 2l ui) 



<5(«o,Ml,«2,«3) 



(6) 



since the Jacobian term 



dA2i(aQ,»i,»2,»3) 



evaluates to 1 . 



The densities q(ao), q(p), q{f]) and q((Xo) are all assumed to be Gaussian densities, with respective 
parameters {m\,0\), (m 2 ,a 2 ), (m 3 ,a$) and (m v ,o~ v ). Theoretically, we can choose arbitrary values for the 
location parameters m\, m 2 , mj, and m v and for the scale parameters d\, G 2 , 03 and <7 V . However, some 
choices will result in an algorithm which takes longer to reach stationarity, since poor choices will result in 
low acceptance rates for between model moves. We fine-tuned the between model transitions by trying several 
different choices for these quantities and all resulted in the same posterior model probabilities. Generally, 
picking m\ and G\ close to the posterior marginal mean and variance for Oo; m 2 and o~ 2 close to the marginal 
posterior mean and variance of p; m?, and 03 close to the marginal mean and variance of Tj ; m Y and a v close 
to the posterior marginal mean and variance of (Xq results in an algorithm where between model jumps are 
easier. We determined these posterior values by running each model in turn and recording posterior estimates 
of the mean and variance of the model parameters. These estimates are then used as proposal parameters 
in the reversible jump implementation. This scheme can only be used when there are a small number of 
candidate models as it becomes infeasible when the number of candidate models is large. In Section I5.2l we 
propose to use an automatic sampler which can choose location and scale parameters to maximise between 
model transitions based on methods presented in lBrooks et alj (2003). 

The reverse move from (Mi,Oo,p,Tj) to (M 2 ,01q) is achieved by simulating v from density q(v) then 
setting 

(o^,mi,m 2 ,«3) = h 2 i{aa,p,r],v) = (v,a ,P,i?) 

for which the acceptance probability of accepting this dimension changing move is then vafrn\\,Az}} where 
A21 is given in Equation (|6]). 

The description is similar for moves between models M\ and M3. Assume we are at model (M3, tj") and 
a move to model M\ is proposed. We simulate u\, u 2 , M3 from densities q(u{), q{u 2 ) and q{u 3 ) respectively 
and set 

(O0,p,T],w) =/l3i(T7",Mi,M2,M 3 ) = (ui,U 2 ,U3,T\"), 

where w is introduced to ensure dimension matching. 

The probability of accepting this move is then min{l,A3i} where 
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7t(M h rxQ,p,ri) 



q(w) 



7T(M 3 ,7]") q(ui)q(u 2 )q(u 3 ) 



dh 31 (ri",u l ,u 2 ,u 3 ) 



d(r\" ,u\,u 2 ,u 3 ) 



(7) 



For reasons similar to those given above q(u\), q(u 2 ), q(u 3 ) and q(w) are densities approximating the poste- 
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Table 3. Parameter Estimates and 95% HPD Intervals. The 
corresponding results for the full model are given in TableQ] 





M 2 


95% HPD Interval 


M 3 


95% HPD Interval 


«0 


0.0252 


(-0.0586, 0.1092) 


_ 




(*! 


0.0253 


(-0.0185, 0.0700) 


0.0275 


(-0.0145, 0.0697) 




0.0253 


(-0.0129, 0.0648) 


0.0244 


(-0.0170, 0.0666) 


a 3 


0.0368 


(-0.0015, 0.0744) 


0.0403 


(-0.0024, 0.0818) 




0.0292 


(-0.0073, 0.0664) 


0.0261 


(-0.0143, 0.0670) 


«5 


0.0358 


(-0.0004, 0.0720) 


0.0359 


(-0.0048, 0.0754) 




0.0362 


(-0.0003, 0.0726) 


0.0361 


(-0.0032, 0.0747) 


a 7 


0.0304 


(-0.0127, 0.0740) 


0.0288 


(-0.0127, 0.0706) 


n 






0.0313 


(-0.0014, 0.0636) 


a 


1145.7 


(0.18, 2884.4) 


1115.2 


(1.46, 2695.4) 


T 


1460.8 


(35.6, 3359.3) 


1617.1 


(53.7, 3783.7) 



rior marginals of On, p, rj and T]", respectively. The reverse move from (M\ , Oo,p, 77) => (M3, 77") is achieved 
by simulating w with density q(w) and setting 

(T]",u l ,U2,u 3 )=h^(a ,p,T],w) = (w,ao,p,ri) 

for which the acceptance probability is the minjljA^ 1 } where A3 1 is given in Equation[7] 

For a proposed move from (M 2 ,(x'q) to (M3,tj"), we simulate w with density q(w) and set (r\" ,v) — 
hz3 (c<o > w ) = ( w > ' ) ■ F° r sucn a P ro P osa l the acceptance probability is min{ 1 , A23 } where 

dh 32 (al ) ,w) 

where q(w) and g(v) are the densities discussed above. Again the Jacobian for this proposed move is 1 since 
the transformation from (M 2 , OCq) to (M3, rj") is the identity function. Notice that with this move we are not 
changing the number of parameters, but swapping 0^ for T]" . The acceptance probability for the reverse move 
is then minjljA^ 1 }. 



5.1.1. Simulation Study 

To test how well the model discrimination scheme works we simulated several datasets and applied the 
algorithm to them. In all cases where data were simulated from model M 2 the algorithm placed the largest 
posterior probability on that model, like with data simulated from model M3 the algorithm placed the highest 
posterior probability on that model. For data simulated from model M\ in some instances the highest posterior 
probability is placed on either model M 2 or model M3. As the value of n increases, it appears as though the 
algorithm will place most of the posterior probabilities on either M 2 or M3 since for large values of n the 
values of Rj simulated approach 77 asymptotically, hence the smaller models M 2 and M 3 offer a better fit to 
the data. 



7r(M 3; T7") q ( v ) 

A 23 = —rr. 7Y X ~rr x 

7r(M 2 ,a^) q(w) 



5.1.2. Model Averaged Results 

The posterior parameter estimates with 95% HPD intervals for each of the three models are given in Table [3] 
The posterior model probabilities are shown in Figure [7] this shows that model M 2 has the greatest posterior 
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0.439 




Mi M 2 M 3 



Model 



Fig. 7. Posterior model probabilities for models Mi, M2, and M3. 



M 3 - 1 




200 400 600 800 1000 

Iteration 



Fig. 8. Trace plot of the model indicator. 
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probability of 0.495, followed by M3 with probability 0.439. The full model Mi has the least posterior 
probability, 0.066. The posterior model probabilities of M2 and M3 seem to contradict the results if we 
consider the posterior distribution of p. Figure|2] shows that the posterior density of p clearly has most mass 
around the node p = 0, so we might expect model M3 to have the greater posterior probability. 

It is interesting to note that many of the parameter estimates are similar under all three models. In 
particular the error variances seem to take very similar values under all three models. Thus model-averaged 
estimates look very similar from those derived from just a single model for this example. Note also the 
posterior distribution for p in the full model has a posterior mean of 0.220. This might naively be interpreted 
as suggesting that the ratio of model probabilities between Model M2 and M3 should be roughly 1 : 4 rather 
than the 1 : 1 ratio observed. The posterior density of rho is shown in Figure [2] 

Figure [8] shows the mixing of the deterministic proposal reversible jump algorithm. It is noticeable that 
even though models M2 and M3 have approximately equal posterior probabilities the algorithm does not mix 
very well. In the next section we set try to improve the mixing of the reversible jump algorithm. 



5.2. Automatic Proposal Choices 

The choice of proposal densities in the reversible jump MCMC implementations given in Section [5] are 
determined by doing a pilot run to determine good parameter choices to describe these densities. In this 
section we show how this process can be made more automatic by proposing an adaptive scheme where 
the proposals are chosen as to maximise the probability of between model moves. Automatic proposals 
are desirable for a number of reasons, mainly because they reduce the ne ed to do trial runs in order to get 



parameter estimates for proposal densities. The method we use is based on lBrooks et al.l d2003l) and uses the 
idea of so-called weak non-identifiability and centering to determine the choice of proposal densities which 
maximises the probability of between model moves. The weak non-identifiability centering point is a choice 
of parameter values which essentially reduces the more complex model to the simpler model. We refer to 
this new implementation as the efficient proposals method and the previous implementation in Section [5] as 
the vanilla implementation. In the remainder of this section we show the details of how the between model 
moves are implemented. 



5.2. 1 . Moving between Models M\ and M2 

Consider a move from model M2 to model M\. The acceptance probability for such a move is min{l,A2i}, 
where 



A 2 i = 



7r(Mi,oco,p,T7) q(cQ 



(8) 



7t(M 2 X) q(ao,p,ri)' 

An ideal choice for q(ao,p, 77) would be n(ao,p, J]\M\), the conditional posterior for (ao,p, 77) given M = 
M\. This density is non-standard, furthermore we would also need to know its normalising constant to 
compute the ratio A%\. We cannot sample directly from this density, but instead we approximate q(ao,p,rj) 
with a trivariate normal density. We approxi mate q(c&) using a Gaussian density whose parame ters we derive 



below. Similar methods have been proposed ( Carlin and Chib , 



1995 



Madigan and York , 



1995). 



The best approximating de nsity for gfoq , p,T?) in our case is one that will maximise A%\. To do this 



we use the A^-order method of 



Brooks et al. 



(2003) and expand A21 as a Taylor series around some point 



(c£o,p\ fj) which they call the centering point. Since we need only to estimate the mean and variance of this 
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trivariate normal density, partial derivatives of order 1 and 2 will suffice. Essentially this means solving 



d(«D,p,T/)* 



121 



= 0, k=l,2 



for the mean vector and covariance matrix for the density q(ao,p,r]), where (ocq , p , f] ) is our chosen centering 
point. However it is usually much easier to do computations with the log of A21, in which case we solve 



logA 2 i 



diocp^f 



=0, £=1,2. (9) 

.ao.p.i?) 



With A21 as given in (O it is not very difficult to see that when we take derivatives of A21 (or logA2i) 
with respect to ( Oo , p , 77 ) the terms involving % (M2 , (Xq ) and q ( Og ) will contribute nothing to that derivative 
and likewise when we take the derivative of A21 or (logA2i) with respect to a' the terms n(M\ , Oo,p, tj) and 
q(ao,p,T]) will contribute nothing to that derivative. In what follows we will ignore terms where appropriate. 
Thus we can compute the first and second partial derivatives of log A21 as 

4^^ = ^^( l0g ^ (Ml ^' P ^ )_1 ° g ^' P ^ )+ ^ (10) 



and 

d 2 logA 21 d 2 



log7t(M l ,a ,p 1 ri)-logq{aQ 1 p,ri)+K2) 1 (11) 



d(ao,P,T?) 2 <?(ao,P,J7) 
where the term K2 = —logn(M2,(Xo) + logg(o!o) is independent of (ao,p,rj). Also we can expand the 
posterior density of (Mi , Oo , p , 77 ) 

n(M u ao,p,ri)ocL(R"\a,a) P (a\p,ri,ao,z)p(ri)p(p)p{(Xo)p{a)p(T)p{Mi) 

and the proposal density for (ao,p,T7) 

q {ao, P ,n) ~ \E\-W~p{-1 (( P°) - M ) V' (( ?) 

the density of a trivariate normal distribution with mean vector jj. and covariance matrix E. Setting ([Tol l and 
( fTTT i equal to zero at the point (ao,p, fj) we get two equations which can be solved simultaneously for the 
variance matrix E and mean vector ji. Solving simultaneously we can easily see that the variance matrix E is 

E~' = 

/ l+Tp 2 -T(a 1 -i7+2p(i7-a )) -Tp(l-p) \ 

- T (a I -7 ) +2p(77-o )) l+tlfj =1 (n-a^if _ T £« =1 [(i_ 2 p)(ij_a / _ 1 )+^_a / ] ( 12 ) 

V -tp(l-p) -Tl» =1 [(l-2p)(fj-a,_ 1 )+77-a ; ] l+nz(\-p) 2 J 

and the mean vector jti satisfies 

fib-/>T(ai-pfib-(i-p)ij) 



p ) -n ) = P+^)=i[n-a h x\[{rt-a^i)p+aj-fi\ 

\\nj J \ n-<i-p)i. n j=i[oij-poij-i-(i-p)n] 
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which results in the estimate 



1" 



Oo-pTt^-poo-tl-p)?;) 
P>T£" =1 ffi-aj-i][(f)-aj_i)p+aj-fj] 

fi-T:{i-p)L"j =[ [aj-pcCj- l -{l-p)fl] 



(13) 



A difficulty arises however, since the above inverse variance matrix is not guaranteed to be positive definite 
(symmetric yes!) as the elements are random. Essentially, this means that the derivatives are not zero within 
the range of positive definite matrices, E . On average in this implementation E fails to be positive definite 
every 16 iterations. Our approach will be to use ([T2| > when it is positive definite. 

In cases where ( TT2T > is not positive definite we force the off-diagonal elements to be zero. Note that 
forcing the off-diagonal elements to being identically zero reduces our proposal from being a trivariate normal 
to being a product of three univariate normals. There are two possible centering points if the off-diagonal 
elements are set to 0, corresponding to p = or p = 1 . We pick the one corresponding to p = 1 since M 2 is a 
sub-model of Mi with p identically equal to 1 . Also with p = 1 fixing the off-diagonal elements at dictates 
that ao = 0:7 and f] = 2oc 7 - a,\ . 

To get the parameters for the density q(a' () ) we simply use the conditional posterior of Oo given M = M 2 . 
This density has mean (1 + t) _1 (tg!i) and variance (1 + t) -1 . This choice can be shown to be optimal in 
terms of maximising the acceptance probability for proposed moves and also satisfies the k' h -order equations 
(0. To see this, we expand 

n(M2,a^)ocL(R n \a',a') P (a'\al ) )p(al ) )p(&)p(T , )p(M2) 



and supposing that q(a' Q ) ~ Jf (jU^Vq), we compute the equations 



da. 



7logA 2 i 



t-J (l°g?(«o) - logrc(M 2 , Oq) +Ki 



= 0, 



logA 
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d 1 



-7T2 (log^K) - log7r(M 2 , ad) + Ki 

(Xrx ) \ 



= 0. 



The term K\ = — log^(Mi,p.Oo,Tj)+log^(p,ao,rj) is independent of the parameter of interest o^. Solving 
simultaneously leads to the estimates fi^ = (1 + T) _1 (rai) and Vq = (1 + t)~' for the mean and variance of 
the proposal distribution. These values are independent of the centering point C(q chosen. 

Note that when p = 1 the new value of r\ is simulated from the prior density of 77, likewise when p = 
OCq is simulate d from t he pri or d ensity on Op. This is a form of the birth death method for reversible jump 
algorithm. See 



Green 



d 1995b and 



Brown 



(2004, Chapter 8). 



5.2.2. Moving between Models Mi and M3 
Consider the ratio 

A = n(Muao,P,ri) qjjf) 
31 n(M 3 ,-n") «(c*>,P,t?)' 

notice that in taking logs and differentiating with respect to (ofo, p , tj ) we remove all terms involving M 2 and 
7]' . For this reason the expressions given for the inverse variance matrix and mean vector for a proposed move 
of type M3 to Mi are exactly the same as those given in Equations (fT2l and (fl~3b . The principal difference 
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is that since model M3 is a sub-model of M\ with p identically equal to 0, we choose a centering point with 
p = 0. Also whenever the proposed variance matrix is not positive definite we again force the off-diagonal 
elements to be zero which forces Oo = 2na,\ — 2£" =1 OCj + OCj and 77 = 0C\. Likewise the parameters for the 
proposal density q(j\") which maximises A31 can be shown to be the posterior conditional mean of 77" and 
the posterior conditional variance of r\" given that M = M3. This density has mean (1 +«t)~ 1 (t£" =1 af) 
and variance (1 +/it)~ 1 . 



5.2.3. Moving between Models M 2 and M 3 

For a move between models M 2 and M3 there is no change in the size of the parameter vector. The acceptance 
probability for such a move is min{l,A3 2 } where 



A32 = 



K{M 2 ,a' a ) q (r}") 
Tt{Mi,r\") q{aQ 



We use Gaussian densities for the proposals q{f]") and q(oc^). Solving 



= 0and 



<?(77") 2 



logA 32 



= 



simultaneously shows that q(T}") has mean (1 +nr) _1 (t£" =1 OLj) and variance (1 + nt) . The reader will 
notice at once that these quantities are the conditional posterior mean and variance of rj" given M = M3. 
Similarly solving 



odd. 



Oand 



d(a>Y 



logA 32 







simultaneously shows that q(a' () ) has mean (1 + t) 
posterior mean and variance of given M = M 2 . 



1 {x(X\ ) and variance (1 + t) , which are the conditional 



We can summarise this by saying that q{v\") = 7z(t]"\M^) and q(u' a ) = n(a.Q\M2) are the proposals which 
will maximise the acceptance probability for proposed moves between models M 2 and M3, and that these 
choices are independent of the centering point chosen. In this case the ratio A3 2 reduces to 



A32 = 



K(M 2 X)gW) 
7r(M 3 ,T7") q(oQ 

_ K(M 2 ,oj) %{r]"\M 3 ) 

7c(M 3 ,r}") %{a' \M 2 y 

In our simulations using this term should increase the between model moves. This was observed in our 
simulations as all proposed moved from model M3 to model M 2 were accepted, whereas for the vanilla im- 
plementation such moves were accepted with probability 0.498. Similarly a proposed move from model M 2 
to model M3 is accepted with probability 0.930 when the posterior conditionals are used as proposals, im- 
proving upon the 0.440 probability obtained with the vanilla implementation. The empirical resul t s obse rved 
here are actually specific cases of more general results which can be found in Ehlers and Brooks! (12002b . 
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Fig. 10. Trace plot of the model indicator, second reversible jump implementation. The horizontal axis shows 
the iteration number and the vertical axis shows the model indicator. 
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Fig. 11. Convergence diagnostics for the vanilla implementation. The horizontal axis times 1000 gives the 
iteration number. 




Fig. 1 2. Convergence diagnostics for the automatic proposals implementation. The horizontal axis times 1 000 
gives the iteration number. 
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5.3. Comparing the Model-move Schemes 

The empirical transition matrices for the vanilla reversible jump method, p val \ and for the second order 
method, P eS , are, respectively 





Mi 


M 2 


M 3 






Mi 


M 2 


M 3 


Mi 


f 0.703 


0.154 


0.142^ 




Mi 


f 0.597 


0.121 


0.281 \ 


P van = M 2 


0.020 


0.758 


0.220 


and P eS = 


M 2 


0.018 


0.516 


0.465 


M 3 


^0.021 


0.249 


0.729 J 




M 3 


^0.043 


0.501 


0.456 y 



The empirical transition matrices are computed by setting the (/, j')-element equal to the proportion of times 
the model indicator M, follows the model indicator M, for one long run of the reversible jump algorithm, in 
this case for 1000000 iterations. 

They matrices clearly that between model (off-diagonal) transitions have increased for P eS the transition 
matrix for the efficient proposals method, except between models Mi and M 2 where there were small de- 
creases. To assess convergence of the algorithm, we simulated 3 chains using different starting values and 
different random number seeds for a total of 1000000 iterations. In Section |4~2l we introduced two methods of 
assessing convergence of reversible jump chains. Both the ^-square and Kolmogorov-Smirnov diagnostics 
are used to assess convergence of our simulations. These diagnostics are plotted in FiguresQ~T]and Q~2]for the 
vanilla reversible jump algorithm and efficient proposals implementations, respectively. Clearly the efficient 
proposals implementation performs better than the vanilla implementation 

We summarise by giving the efficient proposals results applied to the models discussed in Section [5] and 
compare them with those obtained using the vanilla reversible jump algorithm using the fine-tuned proposals 
described in Section [5] We end this section by briefly addressing convergence issues. The posterior model 
probabilities are shown in Figure|9] the posterior model probabilities are similar to those obtained in Section|5] 
Model Mi has posterior probability 0.069, M 2 has posterior probability 0.482 and M3 has posterior probability 
0.449. While the computing effort required to implement this model is a bit greater than that required for 
the vanilla reversible jump method, the improved mixing can also be seen by comparing Figures [51 and [TOl 
Figure [10] shows that the algorithm jumps between models more frequently for the second implementation 
compared with the fine-tuned proposals implementation shown in Figure [8] The within model parameter 
estimates are almost identical to those obtained using the implementation in Section [5] and are not tabulated 
here. The minor differences we attribute to Monte Carlo errors. 



6. Summary 

The reversible jump algorithm is presented as a method of computing posterior model probabilities in a 
Bayesian setting. The vanilla reversible jump algorithm although theoretically sound has some implemen- 
tational problems. One such problem is the choice of mapping function, another is the choice of proposal 
density parameters. In this paper we have shown how recent methodological advances in reversible jump 
computing can be applied to model selection problems. This is particularly useful for actuarial practitioners 
where the most appropriate choice of model is important. 
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